Docket No.: PF-0357-1 DIV

REMARKS 

The claims have been amended to clarify the invention. In particular, claim 1 has been
amended to delete a "biologically active fragment", and step d) of claim 1 has been amended to recite
an immunogenic fragment of SEQ ID NO:1 consisting of at least 15 contiguous amino acid residues of
SEQ ID NO:1. Support for this amendment is found in the specification, for example, at page 43,
Example XI, which describes the identification and synthesis of suitable immunogenic fragments within
SEQ ID NO:1 and the fact that typically such fragments are at least 15 residues in length. Claims 1, 2
and 4 have been amended to recite the amino acid sequence of SEQ ID NO:1. Claim 3 has been
amended to delete the phrase "an effective amount of". No new matter is added by these amendments,
and entry of the amendments is requested.
Response Filed July 15, 2003 

The Examiner acknowledged applicants' response to the Non-Responsive communication
mailed June 20, 2003, indicating that the Non-Responsive communication was improper.
Election/Restriction 

The Examiner acknowledged applicants' election of claims 1-4 with traverse on the grounds that
the composition of matter claims and the claims to their methods of use could be examined without undue
burden. However, the Examiner stated that claims 8, 15, 16 and new claims 35-37 directed to process
claims will be rejoined and examined pending allowance of product claims, provided that the process
claims depend from or otherwise include all the limitations of the patentable product. Applicants are also
advised that process claims should be amended during prosecution either to maintain dependency on
the product claims or otherwise include the limitations of the product claims. Failure to do so may result
in a loss of the right to rejoinder.

Accordingly, cancellation of claims 11-14 and 21-34 is acknowledged, claims 5-10, 15-20 and 
35-37 are withdrawn as being drawn to a nonelected invention, and claims 1-4 are hereby examined. 
35 U.S.C. § 112, First Paragraph, Rejection of Claims 1-3

The Examiner has rejected claims 1-3 under 35 U.S.C. § 112, first paragraph as failing to 
comply with the written description requirement because the claim(s) contains subject matter which was 
not described in the specification in such a way as to reasonably convey to one skilled in the relevant art 
that the inventor(s), at the time the application was filed, had possession of the claimed invention. 

The Examiner stated that the claims are broadly drawn to naturally occurring amino acid
sequences having 95% identity to any amino acid sequence within SEQ ID NO:1 or the entirety of
SEQ ID NO:1, and to a biologically active fragment or immunogenic fragment of SEQ ID NO:1, and
that such recitation encompasses an extremely large genus of mutants, variants, and homologs of SEQ
ID NO:1.

The Examiner stated that the specification teaches the amino acid sequence of SEQ ID NO:1
and characterizes it by homology to a rat synaptojanin 28 kDa isoform and various potential sequence
motifs, but stated that the specification does not disclose or fully characterize any variants of these
proteins which have the same biological activity or a different biological activity. According to the
Examiner, the specification has not identified the actual biological activity or function of SEQ ID NO:1,
and the claims thereby encompass polypeptides having any biological activity. The Examiner then cited
various publications in support of an allegation that sequence comparisons, alone, are not predictive of
protein function. See, in particular, Bork (2000); Van de Loo et al. (1995); Seffernick et al. (2001); and
Broun et al. (1998), all cited at page 5 of the Office Action.

The Examiner then cited various court decisions related to adequate fulfillment of the written
description requirement; in particular, Vas-Cath Inc. v. Mahurkar (applicant must convey with
reasonable clarity to those skilled in the art that, as of the filing date sought, he or she was in possession
of the invention); Fiers v. Revel and Amgen Inc. v. Chugai Pharmaceutical Co. Ltd. (Adequate
written description requires more than a mere statement that it is a part of the invention and reference to
a method for isolating it. The nucleic acid is required). The Examiner further cited University of
California v. Eli Lilly and Co.

Therefore, the Examiner stated, only SEQ ID NO:1, but not the full breadth of the claims, meets
the written description provision of 35 U.S.C. 112, first paragraph. The species specifically disclosed
are not representative of the genus because the genus is highly variant. Applicant is reminded that
Vas-Cath makes clear that the written description provision of 35 U.S.C. 112 is severable from its
enablement provision.
Applicants' Response

Claim 1 has been amended to delete the recitation of a "biologically active fragment", and
claims 2 and 3 no longer depend from a claim reciting that limitation. Claim 1 has been further
amended at step b) to recite a naturally occurring variant of the amino acid sequence of SEQ ID NO:1,
and at step d) to recite an immunogenic fragment of SEQ ID NO:1 having at least 15 contiguous
residues. Therefore, the claim does not encompass "any" amino acid sequence within SEQ ID NO:1.
Applicants disagree that the claims, as amended, lack an adequate written description in accordance
with 35 U.S.C. 112, first paragraph, particularly for the claimed variants of SEQ ID NO:1. Applicants
agree, however, that the requirements necessary to fulfill the written description requirement of 35
U.S.C. 112, first paragraph, are well established by case law, some of which has been cited by the
Examiner. Applicants merely disagree with the Examiner's interpretation of these citations as they
apply to the instant case. For example, Vas-Cath states:

... the applicant must also convey with reasonable clarity to those skilled in 
the art that, as of the filing date sought, he or she was in possession of the invention. 
The invention is, for purposes of the "written description" inquiry, whatever is now 
claimed. Vas-Cath, Inc. v. Mahurkar, 19 USPQ2d 1111, 1117 (Fed. Cir. 1991) 

Attention is also drawn to the Patent and Trademark Office's own "Guidelines for Examination
of Patent Applications Under the 35 U.S.C. Sec. 112, para. 1", published January 5, 2001, which
provide that:

An applicant may also show that an invention is complete by disclosure of sufficiently 
detailed, relevant identifying characteristics which provide evidence that applicant was in 
possession of the claimed invention, i.e., complete or partial structure, other physical 
and/or chemical properties, functional characteristics when coupled with a known or 
disclosed correlation between function and structure, or some combination of such 
characteristics. What is conventional or well known to one of ordinary skill in the art 
need not be disclosed in detail. If a skilled artisan would have understood the inventor 
to be in possession of the claimed invention at the time of filing, even if every nuance of 
the claims is not explicitly described in the specification, then the adequate description 
requirement is met. 

Thus, the written description standard is fulfilled by both what is specifically disclosed and what 
is conventional or well known to one skilled in the art. 

SEQ ID NO:1 is specifically disclosed in the application (see, for example, page 2, lines
28-29). Variants of SEQ ID NO:1 are described, for example, at page 11, lines 16-23. In particular, the
preferred, more preferred, and most preferred variants of SEQ ID NO:1 (80%, 90%, and 95% amino
acid sequence identity to SEQ ID NO:1) are described, for example, at page 12, lines 26-30. Incyte
clones in which the nucleic acids encoding the human NYSN-1 were first identified and libraries from
which those clones were isolated are described, for example, at page 11, lines 29-33 of the
Specification. Chemical and structural features of NYSN-1 are described, for example, on page 12,
lines 1-20. Given SEQ ID NO:1, one of ordinary skill in the art would recognize naturally-occurring
variants of SEQ ID NO:1 having 95% sequence identity to SEQ ID NO:1. Accordingly, the
Specification provides an adequate written description of the recited polypeptide sequences.

A. The Specification provides an adequate written description of the claimed "variants" of
SEQ ID NO:1.

The Office Action has further asserted that the claims are not supported by an adequate written 
description because 

The specification does not disclose or fully characterize any variants of these proteins 
which have the same biological activity or a different biological activity 

(page 4 of the Office Action of 10/22/2003) 

Such a position is believed to present a misapplication of the law. 

1. The present claims specifically define the claimed genus through the recitation 
of chemical structure 

Court cases in which "DNA claims" have been at issue (which are hence relevant to claims to
proteins encoded by the DNA and antibodies which specifically bind to the proteins) commonly
emphasize that the recitation of structural features or chemical or physical properties is an important
factor to consider in a written description analysis of such claims. For example, in Fiers v. Revel, 25
USPQ2d 1601, 1606 (Fed. Cir. 1993), the court stated that:

If a conception of a DNA requires a precise definition, such as by structure, formula, 
chemical name or physical properties, as we have held, then a description also requires 
that degree of specificity. 

In a number of instances in which claims to DNA have been found invalid, the courts have
noted that the claims attempted to define the claimed DNA in terms of functional characteristics without
any reference to structural features. As set forth by the court in University of California v. Eli Lilly
and Co., 43 USPQ2d 1398, 1406 (Fed. Cir. 1997):

In claims to genetic material, however, a generic statement such as "vertebrate insulin 
cDNA" or "mammalian insulin cDNA," without more, is not an adequate written 
description of the genus because it does not distinguish the claimed genus from others, 
except by function. 

Thus, the mere recitation of functional characteristics of a DNA, without the definition of 
structural features, has been a common basis by which courts have found invalid claims to DNA. For 
example, in Lilly, 43 USPQ2d at 1407, the court found invalid for violation of the written description 
requirement the following claim of U.S. Patent No. 4,652,525: 

1. A recombinant plasmid replicable in procaryotic host containing within its nucleotide 
sequence a subsequence having the structure of the reverse transcript of an mRNA of a 
vertebrate, which mRNA encodes insulin. 

In Fiers, 25 USPQ2d at 1603, the parties were in an interference involving the following count: 

A DNA which consists essentially of a DNA which codes for a human fibroblast 
interferon-beta polypeptide. 

Party Revel in the Fiers case argued that its foreign priority application contained an adequate 
written description of the DNA of the count because that application mentioned a potential method for 
isolating the DNA. The Revel priority application, however, did not have a description of any particular 
DNA structure corresponding to the DNA of the count. The court therefore found that the Revel 
priority application lacked an adequate written description of the subject matter of the count. 

Thus, in Lilly and Fiers, nucleic acids were defined on the basis of functional characteristics
and were found not to comply with the written description requirement of 35 U.S.C. § 112; i.e., "an
mRNA of a vertebrate, which mRNA encodes insulin" in Lilly, and "DNA which codes for a human
fibroblast interferon-beta polypeptide" in Fiers. In contrast to the situation in Lilly and Fiers, the
claims at issue in the present application define polypeptides in terms of chemical structure, rather than
functional characteristics. For example, the "variant language" of independent claim 1 recites
chemical structure to define the claimed genus:

1. An isolated polypeptide comprising an amino acid sequence selected from the group
consisting of:...b) a naturally-occurring amino acid sequence having at least 95%
sequence identity to the amino acid sequence of SEQ ID NO:1...

From the above it should be apparent that the claims of the subject application are
fundamentally different from those found invalid in Lilly and Fiers. The subject matter of the present
claims is defined in terms of the chemical structure of SEQ ID NO:1. In the present case, there is no
reliance merely on a description of functional characteristics of the polypeptides recited by the claims.
In fact, there is no recitation of functional characteristics. Moreover, if such functional recitations were
included, they would add to the structural characterization of the recited polypeptides. The claims of
the present application define the polypeptides by structural features, and cases such as Lilly and
Fiers stress that the recitation of structure is an important factor to consider in a written description
analysis of claims of this type. By failing to base its written description inquiry "on whatever is now
claimed," the Office Action failed to provide an appropriate analysis of the present claims and how they
differ from those found not to satisfy the written description requirement in Lilly and Fiers.

2. The present claims do not define a genus which is "highly variant" 

Furthermore, the claims at issue do not describe a genus which could be characterized as 
"highly variant." Available evidence illustrates that the claimed genus is of narrow scope. 

In support of this assertion, the Examiner's attention is directed to the enclosed reference by
Brenner et al. ("Assessing sequence comparison methods with reliable structurally identified distant
evolutionary relationships," Proc. Natl. Acad. Sci. USA (1998) 95:6073-6078; attached as Exhibit A).
Through exhaustive analysis of a data set of proteins with known structural and functional relationships
and with <90% overall sequence identity, Brenner et al. have determined that 30% identity is a reliable
threshold for establishing evolutionary homology between two sequences aligned over at least 150
residues. (Brenner et al., pages 6073 and 6076.) Furthermore, local identity is particularly important in
this case for assessing the significance of the alignments, as Brenner et al. further report that ≥40%
identity over at least 70 residues is reliable in signifying homology between proteins. (Brenner et al.,
page 6076.)

The present application is directed, inter alia, to synaptojanin proteins related to the amino
acid sequence of SEQ ID NO:1. In accordance with Brenner et al., naturally occurring molecules may
exist which could be characterized as synaptojanin proteins and which have as little as 40% identity
over at least 70 residues to SEQ ID NO:1. The "variant language" of the present claims recites, for
example, polypeptides comprising "a naturally-occurring amino acid sequence having at least 95%
sequence identity to the sequence of SEQ ID NO:1" (note that SEQ ID NO:1 has 305 amino acid
residues). This variation is far less than that of all potential synaptojanin proteins related to SEQ ID
NO:1, i.e., those synaptojanin proteins having as little as 40% identity over at least 70 residues to SEQ
ID NO:1.
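
By way of illustration only, the difference in scope between the two thresholds can be put in numbers. The short sketch below assumes that identity is counted position by position over the full 305-residue length of SEQ ID NO:1 with no gaps; that counting convention and the function name are assumptions made for this example and are not taken from the Specification or from Brenner et al.

```python
# Illustrative arithmetic only. Assumes percent identity is counted over the
# stated alignment length with no gaps (an assumption for this sketch).

def min_identical_residues(length: int, percent_identity: int) -> int:
    """Smallest whole number of identical positions meeting the threshold."""
    # Integer ceiling of length * percent / 100, avoiding floating point.
    return -(-length * percent_identity // 100)

SEQ_LENGTH = 305  # residues in SEQ ID NO:1, as stated in the text

# Claimed variants: at least 95% identity over the full sequence.
claimed_min = min_identical_residues(SEQ_LENGTH, 95)
max_differences = SEQ_LENGTH - claimed_min

# Homology threshold discussed above: 40% identity over at least 70 residues.
homology_min = min_identical_residues(70, 40)

print(claimed_min, max_differences, homology_min)
```

Under these assumptions a claimed variant must match SEQ ID NO:1 at no fewer than 290 of 305 positions, leaving at most 15 differing residues, whereas the 40% threshold is met by only 28 matching positions in a 70-residue alignment.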

3. The state of the art at the time of the present invention is further advanced than 
at the time of the Lilly and Fiers applications 

In the Lilly case, claims of U.S. Patent No. 4,652,525 were found invalid for failing to comply
with the written description requirement of 35 U.S.C. § 112. The '525 patent claimed the benefit of
priority of two applications, Application Serial No. 801,343 filed May 27, 1977, and Application Serial
No. 805,023 filed June 9, 1977. In the Fiers case, party Revel claimed the benefit of priority of an
Israeli application filed on November 21, 1979. Thus, the written description inquiry in those cases was
based on the state of the art at essentially the "dark ages" of recombinant DNA technology.

The present application has a priority date of July 31, 1997. Much has happened in the
development of recombinant DNA technology in the 18 or more years between the filing of the
applications involved in Lilly and Fiers and the filing of the present application. For example, the
technique of polymerase chain reaction (PCR) was invented. Highly efficient cloning and DNA sequencing
technology has been developed. Large databases of protein and nucleotide sequences have been
compiled. Much of the raw material of the human and other genomes has been sequenced. With these
remarkable advances one of skill in the art would recognize that, given the sequence information of
SEQ ID NO:1, and the additional extensive detail provided by the subject application, the present
inventors were in possession of the claimed polypeptide variants at the time of filing of this application.

4. Summary 

The Office Action failed to base its written description inquiry "on whatever is now claimed."
Consequently, the Action did not provide an appropriate analysis of the present claims and how they
differ from those found not to satisfy the written description requirement in cases such as Lilly and
Fiers. In particular, the claims of the subject application are fundamentally different from those found
invalid in Lilly and Fiers. The subject matter of the present claims is defined in terms of the chemical
structure of SEQ ID NO:1. The courts have stressed that structural features are important factors to
consider in a written description analysis of claims to nucleic acids and proteins. In addition, the genus
of polypeptides defined by the present claims is adequately described, as evidenced by Brenner et al.
and consideration of the claims of the '740 patent involved in Lilly. Furthermore, there have been
remarkable advances in the state of the art since the Lilly and Fiers cases, and these advances were
given no consideration whatsoever in the position set forth by the Office Action.

For all of the above reasons, applicants submit that the claims, as amended, are fully in 
compliance with the written description requirement as set forth in 35 U.S.C. § 112, first paragraph, 
and therefore request withdrawal of the rejection of claims 1-3 under this paragraph. 
35 U.S.C. § 112, First Paragraph, Rejection of Claims 1-4

The Examiner has rejected claims 1-4 under 35 U.S.C. § 112, first paragraph, as containing
subject matter which is not described in the specification in such a way as to enable one skilled in the
art to which it pertains, or with which it is most nearly connected, to make and/or use the invention.

The Examiner stated that the claims are not enabled for "biologically active fragments" of the
polypeptide SEQ ID NO:1 (NYSN-1) with characteristics of the instant claims. The specification has
not taught or demonstrated those sequences of SEQ ID NO:1 that are responsible for biological
activity... With respect to claims 3 and 4, the Examiner stated that applicants contemplate the use of
these sequences, in addition to the whole SEQ ID NO:1, in a composition directed (as indicated by the
specification) to cancer and immune disorders, which in effect encompasses a method of treatment for
which the instant application is not enabled. The specification does not teach how to use or what
portions of SEQ ID NO:1 to use in order to be "effective" in treatment.
Applicants' Response

The amendment to claim 1 deleting the recitation of a "biologically active fragment" has been
previously noted. Claim 3 has been further amended to delete the phrase "an effective amount of" from
the claim. Therefore, the claim as amended no longer implies a composition used in any therapeutic
treatment. The use of compositions comprising proteins and "acceptable excipients" for simple storage
and stability purposes is well within the "relative skill of those in the art" and "the state of the prior art"
in accordance with In re Wands, and is thus fully enabled for that purpose.

With these amendments and remarks, applicants submit that claims 1-4 are fully enabled and
therefore request withdrawal of the rejection of claims 1-4 under 35 U.S.C. § 112, first paragraph.
35 U.S.C. § 112, Second Paragraph, Rejection of Claims 3 and 4

The Examiner has rejected claims 3 and 4 under 35 U.S.C. § 112, second paragraph, as being
indefinite for failing to particularly point out and distinctly claim the subject matter which applicant
regards as the invention. In particular, the Examiner stated, the term "effective" in claim 3 is a relative
term which renders the claims indefinite.

The term "effective" has been deleted from claim 3, and claim 4 no longer depends from a claim
reciting the term. Withdrawal of the rejection of claims under 35 U.S.C. § 112, second paragraph is
therefore requested.

35 U.S.C. § 102(b), Rejection of Claims 1-3

The Examiner has rejected claims 1-3 under 35 U.S.C. § 102(b) as being anticipated by the
1993 Sigma Chemical Catalog product P 2254, and alternatively by Mochizuki et al. (US Patent
5,395,916; March 1995). Sigma Chemical Catalog product P 2254 is a poly-proline which has 100%
identity to the poly-proline fragment at amino acid residues 270-274 of SEQ ID NO:1. Mochizuki et
al. teaches a pharmaceutical composition containing an amino acid sequence of poly-proline that is
identical to the poly-proline fragment at amino acid residues 270-274 of SEQ ID NO:1 and is
comprised in an effective amount for a pharmaceutical composition (claim 3).
Applicants' Response

Claim 1 has been amended to recite only fragments of SEQ ID NO:1 having at least 15
contiguous residues. Claim 3 has further been amended to delete any reference to a "pharmaceutical"
composition. Neither the Sigma Catalog nor Mochizuki et al. teaches SEQ ID NO:1, a variant of SEQ ID
NO:1 having at least 95% sequence identity to SEQ ID NO:1, or a fragment of SEQ ID NO:1 having
at least 15 contiguous residues of SEQ ID NO:1. Nor does either of these references teach a
"composition" comprising any of these sequences. Withdrawal of the rejection of claims 1-3 under 35
U.S.C. § 102(b) is therefore requested.
CONCLUSION

In light of the above amendments and remarks, Applicants submit that the present application is
fully in condition for allowance, and request that the Examiner withdraw the outstanding
objections/rejections. Early notice to that effect is earnestly solicited. Applicants further request that,
upon allowance of claim 1, claims 15-16 and 35-37 be rejoined and examined as process claims that
depend from and are of the same scope as product claim 1 in accordance with In re Ochiai and
MPEP § 821.04.

If the Examiner contemplates other action, or if a telephone conference would expedite 
allowance of the claims, Applicants invite the Examiner to contact the undersigned at the number 
listed below. 

Applicants believe that no fee is due with this communication. However, if the USPTO 
determines that a fee is due, the Commissioner is hereby authorized to charge Deposit Account 
No. 09-0108. 



Date: 




Respectfully submitted, 
INCYTE CORPORATION




David G. Streeter, Ph.D. 
Reg. No. 43,168 

Direct Dial Telephone: (650) 845-5741 



Customer No.: 27904 
3160 Porter Drive 
Palo Alto, California 94304 
Phone: (650) 855-0555 
Fax: (650) 849-8886 



Attachment(s): Brenner et al., Proc. Natl. Acad. Sci. USA (1998) 95:6073-6078 (Exhibit A) 

Exhibit A

Brenner et al., "Assessing sequence comparison methods with reliable structurally identified distant
evolutionary relationships," Proc. Natl. Acad. Sci. USA (1998) 95:6073-6078.

[Scanned pages of the attached reference; the text is not recoverable from this copy.]